The epidemiology of diphtheria in Haiti, December 2014–June 2021: A spatial modeling analysis

Background Haiti has been experiencing a resurgence of diphtheria since December 2014. Little is known about the factors contributing to the spread and persistence of the disease in the country. Geographic information systems (GIS) and spatial analysis were used to characterize the epidemiology of diphtheria in Haiti between December 2014 and June 2021.
Methods Data for the study were collected from official and open-source databases. Choropleth maps were developed to understand spatial trends of diphtheria incidence in Haiti at the commune level, the third administrative division of the country. Spatial autocorrelation was assessed using the global Moran’s I. Local indicators of spatial association (LISA) were employed to detect areas with spatial dependence. Ordinary least squares (OLS) and geographically weighted regression (GWR) models were built to identify factors associated with diphtheria incidence. The performance and fit of the models were compared using the adjusted R-squared (R²) and the corrected Akaike information criterion (AICc).
Results From December 2014 to June 2021, the average annual incidence of confirmed diphtheria was 0.39 cases per 100,000 (range of annual incidence = 0.04–0.74 per 100,000). During the study period, diphtheria incidence presented weak but significant spatial autocorrelation (I = 0.18, p<0.001). Although diphtheria cases occurred throughout Haiti, nine communes were classified as disease hotspots. In the regression analyses, diphtheria incidence was positively associated with health facility density (number of facilities per 100,000 population) and degree of urbanization (proportion of urban population). Incidence was negatively associated with female literacy. The GWR model considerably improved model performance and fit compared to the OLS model, as indicated by the higher adjusted R² value (0.28 vs 0.15) and lower AICc score (261.97 vs 267.13).
Conclusion This study demonstrates that GIS and spatial analysis can support the investigation of epidemiological patterns. Furthermore, it shows that diphtheria incidence exhibited spatial variability in Haiti. The disease hotspots and potential risk factors identified in this analysis could provide a basis for future public health interventions aimed at preventing and controlling diphtheria transmission.


Introduction
Diphtheria is a highly contagious, vaccine-preventable disease caused by Corynebacterium diphtheriae [1][2][3]. Transmission occurs primarily by droplet or contact with nasopharyngeal secretions of infected people. The hallmark of infection is the formation in the upper respiratory tract of the pseudomembrane, a thick, gray coating consisting of necrotic tissue and bacteria [1][2][3]. Diphtheria complications include respiratory insufficiency, myocarditis, and neuritis. The fatality rate among confirmed cases is 5-10%. However, higher rates have been observed among certain groups (e.g., untreated, unvaccinated individuals) [1][2][3].
In recent years, despite the existence of a safe and effective vaccine, diphtheria has been experiencing a dramatic resurgence worldwide, with major outbreaks reported in Bangladesh, Venezuela, and Yemen [4][5][6]. In 2019 alone, 22,625 cases were reported globally, a 407% increase from 2015, when 4,535 infections were recorded [6]. The situation is exacerbated by the current shortage of the life-saving diphtheria antitoxin, resulting from a decline in production due to decreasing demand [7,8].
Presently, Haiti is among the countries worst hit by the disease in the Americas. From December 2014 to June 2021, 1,281 suspected cases were detected in the country [9]. Past research has shown inadequate levels of diphtheria immunization among confirmed cases in Haiti [10][11][12]. Nevertheless, little is known about other factors contributing to the spread and persistence of the disease in the country. Moreover, areas at high risk for infection remain unknown. Understanding the spatial patterns of diphtheria transmission and the associated factors is critical for developing and implementing effective interventions.
Over the last two decades, geographic information systems (GIS) and spatial analysis have emerged as key tools for detecting disease hotspots and identifying factors correlated with disease transmission [13,14]. Few studies have employed GIS and spatial analysis to examine diphtheria. For instance, Podavalenko [15] detected a significant correlation between diphtheria incidence and vaccination coverage, population density, and population growth rate in Ukraine during 1985-2016. Nailul et al. [16] also identified a negative association between diphtheria incidence and vaccination coverage in East Java, Indonesia in 2010. Furthermore, Quesada [17] found that diphtheria incidence was associated with poverty rates during an outbreak in San Antonio, Texas in 1970.
The present study set out to characterize the spatial epidemiology of diphtheria in Haiti from December 2014 to June 2021. Specifically, it aimed to determine the subnational distribution of confirmed cases in the country; locate hotspots of transmission; and identify potential factors associated with the incidence of the disease.

Study area
Haiti (19.00˚N latitude, 72.25˚W longitude) is situated on the western third of Hispaniola, an island in the Caribbean Sea that it shares with the Dominican Republic [18,19]. It is divided into 10 departments consisting of 42 arrondissements, 140 communes, and 570 communal sections. The capital and largest city is Port-au-Prince, which is in the Ouest department. Haiti's population is estimated at about 11 million.

Study design
The study was a retrospective ecological analysis of confirmed diphtheria cases reported to Haiti's Directorate of Epidemiology, Laboratory and Research (Direction d'épidémiologie, des laboratoires et de la recherche; DELR), a body of Haiti's Ministry of Public Health and Population (Ministère de la santé publique et de la population; MSPP) that records, reviews, and validates data on all diphtheria cases reported in the country. In this study, a confirmed case was defined as an individual who tested positive for C. diphtheriae by polymerase chain reaction (PCR) or who was confirmed by epidemiological link. The geographical unit of analysis was the commune. The period under consideration was from 1 December 2014 to 30 June 2021.

Data dictionary
The number of diphtheria cases at the commune level was obtained from the DELR. Crude annual rates by commune were calculated by dividing the number of diphtheria cases reported annually by the corresponding population estimate from the Haitian Institute of Statistics and Informatics (Institut haïtien de statistique et d'informatique; IHSI) [18]. Average rates were calculated by dividing the total number of cases reported during the study period by the sum of the annual populations for the same period. All rates were multiplied by 100,000. Eleven factors that could be linked to diphtheria incidence were selected following a systematic literature review [20]. These were grouped under three domains: health, socioeconomic status, and environment. Table 1 summarizes the study variables.
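The two rate definitions above can be illustrated with a minimal Python sketch. The function names and the commune figures are hypothetical and are not taken from the DELR or IHSI data; the study itself reports using R.

```python
def rate_per_100k(cases, population):
    """Crude rate: cases divided by population, scaled to 100,000."""
    return cases / population * 100_000


def average_rate_per_100k(cases_by_year, population_by_year):
    """Average rate: total cases over the summed yearly populations, scaled to 100,000."""
    return sum(cases_by_year) / sum(population_by_year) * 100_000


# Hypothetical commune with three years of case counts and population estimates.
cases = [2, 5, 3]
population = [50_000, 51_000, 52_000]

annual_rates = [rate_per_100k(c, p) for c, p in zip(cases, population)]
average_rate = average_rate_per_100k(cases, population)
```

Because the average rate pools cases and person-years, it is not the same as the arithmetic mean of the annual rates unless the population is constant.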
Data for most of these variables were extracted from spatially interpolated maps produced by the Demographic and Health Survey (DHS) Program [22]. The maps were freely available as raster files on the DHS Program Spatial Data Repository. The maps were based on a 2016-2017 survey of a nationally representative sample of 13,405 households in Haiti [23]. Using a simple mean approach, datapoints in the maps were aggregated to match the boundaries of each commune using the R programming language [24]. Spatial data on administrative boundaries and health facilities in Haiti were retrieved from Humanitarian Data Exchange (HDX), an open access platform managed by the United Nations Office for the Coordination of Humanitarian Affairs [21]. Other data sources included the MSPP and the IHSI.
Measures of location (i.e., mean, median) and variation (i.e., standard deviation, range, interquartile range) were calculated for continuous variables. Choropleth maps were developed to illustrate the geographic distribution of the study variables. QGIS [25] was used to process data, while the descriptive analysis was performed using the R programming language.
Two variables (DT vaccine stockout and DTP vaccine stockout) were excluded from the analysis due to the large amount of missing data (>10%). Out-of-range values were found for DTP3 vaccine coverage; however, since these values represented <10% of the total number of observations, the variable was included in the analysis. No duplicate values were found in the dataset.

Spatial autocorrelation and hotspot analysis
Spatial autocorrelation analyses were conducted to investigate the spatial pattern of diphtheria incidence and identify hotspots. The global spatial test Moran's I was used to quantify the spatial autocorrelation of diphtheria incidence in Haiti. The Moran's I is an index that measures the extent of spatial autocorrelation in a given dataset using a scale from -1 to +1 [26,27]. A positive Moran's I suggested positive autocorrelation (i.e., the clustering of communes with similar values). A negative Moran's I denoted negative autocorrelation (i.e., the clustering of communes with dissimilar values). A Moran's I close to 0 indicated that values were randomly distributed.
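As an illustration of how the index behaves, the following Python sketch computes the global Moran's I for a toy neighbour structure. It assumes binary contiguity weights (1 for neighbours, 0 otherwise); the weighting scheme applied by the software used in the study may differ, and the four-commune chain and its values are hypothetical.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary contiguity weights.

    `neighbours` maps each unit index to the list of its neighbours' indices.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights (each neighbour link counts once per direction).
    s0 = sum(len(nbrs) for nbrs in neighbours.values())
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbours.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in dev)


# Four hypothetical communes arranged in a row: 0 - 1 - 2 - 3
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1.0, 1.0, 5.0, 5.0], chain)    # similar values side by side
alternating = morans_i([1.0, 5.0, 1.0, 5.0], chain)  # dissimilar values side by side
```

In this toy example the clustered layout yields a positive index and the alternating layout a negative one, matching the interpretation given above.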
PLOS ONE
Since the global Moran's I revealed the overall degree and direction of spatial autocorrelation but not where the clustering of high and low values occurred, local indicators of spatial association (LISA) were also calculated. LISA are a local version of the Moran's I, in which the level of spatial clustering is assessed around each individual geographical unit (e.g., commune) rather than across the entire study area (e.g., Haiti) [28]. In this study, neighbour relationships were defined using a first-order Queen's contiguity method, in which only communes that shared common boundaries were considered to be neighbours. If a commune was situated on an island and, thus, did not share borders with the rest of the study area, it was assigned manually to one of the nearest communes on mainland Haiti [29]. The main output of the LISA analysis was a map showing four types of statistically significant spatial autocorrelation [28]: high-high to indicate the clustering of communes with high diphtheria incidence (i.e., the hotspots); low-low to show the clustering of communes with low incidence (i.e., the cold spots); and low-high and high-low to represent spatial outliers (i.e., low incidence communes surrounded by high incidence communes, and vice versa). All spatial analyses were conducted in GeoDa 1.12 [30]. The level of significance was set at p<0.05. Significance of the spatial tests was evaluated by comparing the observed test results with the results expected under the complete spatial randomness assumption, using a Markov chain Monte Carlo (MCMC) method based on 999 permutations [31].
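The local statistic and its four-way classification can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a reproduction of GeoDa's implementation: it uses row-standardised weights, a conditional permutation test with a one-sided pseudo p-value, and a hypothetical chain of six communes.

```python
import random


def local_moran(values, neighbours, permutations=999, seed=0):
    """Return {unit: (quadrant, pseudo p-value)} from a conditional permutation test."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]
    results = {}
    for i, nbrs in neighbours.items():
        k = len(nbrs)
        lag = sum(z[j] for j in nbrs) / k  # mean standardised value of the neighbours
        observed = z[i] * lag
        others = [z[j] for j in range(n) if j != i]
        extreme = 0
        for _ in range(permutations):
            # Keep unit i fixed and draw its neighbours at random from all other units.
            simulated = z[i] * sum(rng.sample(others, k)) / k
            if (observed >= 0 and simulated >= observed) or (
                observed < 0 and simulated <= observed
            ):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        quadrant = ("high" if z[i] >= 0 else "low") + "-" + ("high" if lag >= 0 else "low")
        results[i] = (quadrant, p_value)
    return results


# Hypothetical chain of six communes: 0 - 1 - 2 - 3 - 4 - 5
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
incidence = [9.0, 8.0, 1.0, 1.0, 1.0, 1.0]
lisa = local_moran(incidence, chain)
```

In a mapping step, only units whose pseudo p-value falls below the chosen significance level would be shown as hotspots, cold spots, or outliers. With so few units, the pseudo p-values in this toy example are not meaningful.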

Regression models
To identify the significant correlates of diphtheria incidence, two regression models were built: ordinary least squares (OLS) and geographically weighted regression (GWR). OLS is a global model which presumes that observations are mutually independent and that relations between dependent and independent variables are constant across a study area. When these assumptions are violated, global models are no longer effective. OLS is defined as [32]:
\[ Y = \beta_0 + \beta X + \varepsilon \]
where Y is the dependent variable, X is the independent variable, \beta_0 is the intercept, \beta is the coefficient explaining the strength and type of relationship between X and Y, and \varepsilon is the residual (i.e., the difference between observed and predicted values).
In contrast with OLS, GWR is a local model that accounts for spatial heterogeneity by generating a unique equation for every unit of a study area [33,34]. Each equation is calibrated based on its neighbouring units, which are weighted using a decreasing function of distance; in other words, nearby areas hold a greater weight than those farther away. The assumption is that everything is related to everything else, but near things are more related than distant things (i.e., Tobler's first law of geography) [35]. GWR can be defined as:
\[ Y_i = \beta_{0i} + \beta_i X_i + \varepsilon_i \]
in which i is the specific location where data on Y and X are measured.
Independent variables to be included in the two models were identified using a multi-stage process to ensure the absence of multicollinearity, which occurs when independent variables are highly correlated with each other [36]. Firstly, Spearman's rank correlation was conducted to identify strong correlations (r ≥ 0.7, p ≤ 0.05). If two or more independent variables were highly correlated, the one with the lowest correlation with diphtheria incidence was excluded. Then, the remaining variables were included in the OLS model. Finally, the variance inflation factor (VIF) was calculated to determine the degree of multicollinearity among the independent variables. A VIF ≤ 5 was considered acceptable. Variables that did not have a statistically significant effect on diphtheria incidence (p>0.1) were removed from the model.
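The first screening step can be sketched in Python as follows. The sketch applies only the correlation-strength threshold and omits the p-value check and the later VIF step; the function names and the six-commune dataset are hypothetical.

```python
def ranks(xs):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def screen(predictors, outcome, threshold=0.7):
    """From each highly correlated pair, drop the predictor less correlated with the outcome."""
    kept = dict(predictors)
    names = list(predictors)
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if a in kept and b in kept and abs(spearman(kept[a], kept[b])) >= threshold:
                weaker = min((a, b), key=lambda name: abs(spearman(kept[name], outcome)))
                del kept[weaker]
    return list(kept)


# Hypothetical values for six communes.
incidence = [1, 2, 3, 4, 5, 6]
predictors = {
    "female_literacy": [6, 5, 4, 3, 2, 1],
    "male_literacy": [6, 5, 4, 3, 1, 2],
    "urbanization": [2, 6, 1, 5, 3, 4],
}
selected = screen(predictors, incidence)
```

Here the two literacy variables are strongly correlated with each other, so the one with the weaker correlation with incidence is dropped, while the uncorrelated urbanization variable is retained.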
The performance of the OLS and GWR models was compared using the adjusted R-squared (R²) and the corrected Akaike information criterion (AICc). R² is the coefficient of determination, which indicates the proportion of variance in the dependent variable that is collectively explained by the independent variables [37]. A drawback of R² is that it increases with the number of added variables. The adjusted R² is similar to the ordinary R², but it imposes a penalty as superfluous variables are included in the model. AICc is a modified version of the Akaike information criterion (AIC), a comparative measure of goodness-of-fit that takes into account model complexity [38]. AIC is obtained as the sum of twice the negative log-likelihood and twice the number of parameters in the model. Lower AIC scores are indicative of higher efficiency (i.e., models that explain a greater amount of variation using fewer parameters). AICc is equivalent to AIC but with a correction for small sample sizes.
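The three fit measures can be written out as a short Python sketch. These are the standard textbook forms for a global model; GWR software typically replaces the parameter count with an effective number of parameters, which is not shown here. The function names and example inputs are illustrative assumptions, and only the two AICc scores are taken from the study.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def aic(log_likelihood, p):
    """AIC: twice the negative log-likelihood plus twice the number of parameters."""
    return -2 * log_likelihood + 2 * p


def aicc(log_likelihood, n, p):
    """AIC plus the small-sample correction term 2p(p + 1) / (n - p - 1)."""
    return aic(log_likelihood, p) + 2 * p * (p + 1) / (n - p - 1)


# Hypothetical inputs to show the penalty at work.
example_adjusted = adjusted_r2(0.5, n=11, k=2)
example_aicc = aicc(-10.0, n=20, p=3)

# Difference between the AICc scores reported for the OLS and GWR models.
delta_aicc = round(267.13 - 261.97, 2)
```

The correction term shrinks toward zero as n grows, so AICc and AIC converge for large samples.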
Output from the GWR model was used to create surface maps of the R² values and local coefficients of each independent variable to explore the spatial variation in the relationship between diphtheria incidence and the selected parameters. All regression models and surface maps were developed using the R programming language.
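To show what a local coefficient is, the following Python sketch fits one distance-weighted regression per location for a single predictor. It assumes a Gaussian kernel with a fixed bandwidth; the kernel and bandwidth used in the study are not stated here, and the coordinates and values are hypothetical.

```python
import math


def gwr_local_fit(coords, x, y, bandwidth):
    """Fit a weighted simple regression at each location; return (intercept, slope) pairs."""
    fits = []
    for ui, vi in coords:
        # Gaussian kernel: weights decay with distance from the focal location.
        w = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))
    return fits


# Six hypothetical locations on a line; the X-Y relationship is positive in the
# west (first three) and negative in the east (last three).
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]

local = gwr_local_fit(coords, x, y, bandwidth=1.0)
near_global = gwr_local_fit(coords, x, y, bandwidth=1e6)
```

With a narrow bandwidth the local slopes take opposite signs at the two ends of the line, whereas a very wide bandwidth weights all locations almost equally and the slopes approach the single global estimate, which is zero for these data.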

Descriptive analysis
From December 2014 to June 2021, 392 confirmed diphtheria cases were recorded in Haiti (Table 2). Most of the cases were female (n = 215; 54.8%) and aged ≤14 years (n = 343; 87.5%). Only 59 cases (15.1%) were reported to be vaccinated against diphtheria, which was defined as having received at least three doses of a diphtheria vaccine.
During the study period, the annual incidence of diphtheria varied greatly, going from 0.04 cases per 100,000 population in 2014 to 0.74 per 100,000 in 2018 (Fig 1). This peak was followed by a three-year decline in reported infection rates.
Information on the commune of origin was not available for two of the 392 cases. As Fig 2 shows, the outbreak appeared to have originated in the Ouest department and to have gradually spread to the rest of the country. Between 2014 and 2015, detection of diphtheria cases remained limited to 21 communes across five departments located in central and northern Haiti. By 2021, cases had been reported in 79 communes, encompassing nine departments. Grand'Anse was the only department to report no confirmed cases throughout the study period. Four departments (i.e., Artibonite, Centre, Nord, and Ouest) accounted for 84% of all confirmed cases. Ouest was the only department to report cases each year.

Spatial autocorrelation and hotspot analysis
The global Moran's I test found modest but statistically significant spatial autocorrelation (I = 0.18, p < 0.001). This suggests that, during the study period, diphtheria incidence was more similar in certain neighbouring communes than would be expected by chance.
The LISA analysis revealed nine communes, home to an estimated 646,346 people (4.7% of the population of Haiti), that can be classified as diphtheria hotspots (Fig 3). Furthermore, one high-low commune (i.e., a high incidence commune surrounded by areas of low diphtheria incidence) was found in the Sud department. An estimated 35,139 people (0.3% of the population) live in this high-low commune. Additionally, the analysis identified 14 cold spots and six low-high outliers (i.e., low incidence communes surrounded by areas of high diphtheria incidence). S1 Appendix lists the identified areas with spatial dependence.

Regression models
The Spearman's rank correlation analysis found that male literacy and female literacy were highly correlated (r = 0.78, p<0.001). Consequently, male literacy was excluded from the pool of independent variables as it did not have a significant correlation with diphtheria incidence (p = 0.18). Low collinearity was observed among the remaining variables (VIF range = 1.18-2.22). Table 3 presents the results of the regression analyses. In the final OLS model, health facility density and the degree of urbanization were positively associated with diphtheria incidence. Specifically, each one-unit increase in health facilities per 100,000 population was associated with an increase of 0.020 in the rate of diphtheria cases per 100,000 population. Similarly, a one-unit increase in the proportion of the population living in urban areas was associated with a 0.009 increase in the rate of diphtheria cases per 100,000. Conversely, a negative association was observed with female literacy: a one-unit increase in the female literacy rate was associated with a decrease of 0.030 in the rate of diphtheria cases per 100,000. The adjusted R² for the final OLS model was 0.15, which indicates that the model explains 15% of the variance seen in diphtheria incidence. This R² value suggests a weak model fit. The AICc score was 267.13.
The GWR model incorporated the same variables as the final OLS model. There was agreement between the OLS and GWR models on the direction of the influence of the selected independent variables on diphtheria incidence. Furthermore, the effect sizes for the independent variables were the same in the two models. However, the GWR model considerably improved model performance and fit compared to the final OLS model, as indicated by the higher adjusted R² value (0.28) and lower AICc score (261.97). These results suggest that, by accommodating spatial non-stationarity and allowing variables to vary in space, the GWR model is better than the OLS model at explaining the relationship between diphtheria incidence and other factors.

The local R² (range = 0.01-0.35) indicates that the explanatory power of the GWR model varies substantially throughout the territory, with higher local R² values found in as many as six different departments.

Discussion
This study has shown that the reported incidence of the disease varied considerably between December 2014 and June 2021, reaching a peak in 2018. The investigation has identified areas with spatial dependence, which suggests that certain communes in Haiti may have predisposing factors increasing the risk of diphtheria transmission. This hypothesis is supported by findings from the GWR model, which have demonstrated that, at the commune level, 28% of the variability in diphtheria incidence in Haiti could be explained by a combination of three factors: health facility density, the degree of urbanization, and female literacy.
The sharp increase in incidence in the early stages of the outbreak indicates that a large proportion of the population in Haiti was susceptible to diphtheria. This is consistent with the results of Minta et al. [39], who found no evidence of long-term protection against the infection (IgG ≥ 1 IU/mL) among a nationally representative sample of 1,146 children aged 5-7 years in Haiti in 2017.
There are a few probable explanations for the decrease in incidence after 2018. That year, the MSPP conducted a mass vaccination campaign that saw more than two million children aged 1-14 years receiving at least one dose of a diphtheria vaccine [39,40]. It is reasonable to assume that the campaign contributed to reducing the number of susceptible individuals, ultimately driving down the incidence of the disease. Nevertheless, the decline in incidence may have also been partly a surveillance artifact. Since 2019, there has been a dramatic surge in politically motivated protests and civil unrest, which has been accompanied by high levels of gang-related violence throughout Haiti [41]. This period has also coincided with the emergence of COVID-19 [42]. The two crises have paralyzed the country for long periods of time, making it more difficult for people in need to access medical care and for health authorities to conduct basic surveillance activities, such as case investigation and contact tracing. As a result, several diphtheria cases may have gone undetected, which suggests that available figures likely underestimate the disease's true spread.
By characterizing the spatial distribution of detected diphtheria incidence, we have shown that the disease has spread widely across Haiti. Nevertheless, substantial heterogeneities in diphtheria incidence exist from one department to another and between communes within the same department. The LISA analysis brought to light a spectrum of diphtheria dynamics that includes several areas with spatial dependence. An estimated 646,346 people (4.7% of the population of Haiti) are living in diphtheria hotspots. Interestingly, some of the identified hotspots are located near the border with the Dominican Republic, which has reported diphtheria cases in recent years [9]. This suggests that close collaboration between the two countries, especially on cross-border surveillance, would be crucial to control the transmission of diphtheria on the island of Hispaniola. The hotspots detected in this study could be prioritized for targeted public health interventions, including raising people's awareness about diphtheria and preventive measures through community health workers, training clinical personnel periodically, and increasing the capacity for laboratory testing. All these interventions have shown promise in the response to other public health issues in Haiti [43][44][45]. However, given that the full implementation of these measures will require considerable investment and time, vaccination continues to be the most vital tool in the fight against diphtheria.
The associations of diphtheria incidence with health facility density, degree of urbanization, and female literacy were somewhat expected. In areas with a high number of clinics and hospitals, the probability of detecting a diphtheria case is higher than elsewhere because of the increased access to healthcare services [46,47]. Urban areas are generally characterized by overcrowding as well as high population mobility and inter-mixing, all of which increase the opportunities for infectious diseases, like diphtheria, to spread [48,49]. Literate women might comprehend health messages better than illiterate women, which makes them more likely to take protective measures (e.g., vaccination and personal hygiene) for themselves and for their children [50,51]. These findings add to existing evidence that health outcomes are shaped by factors beyond healthcare [52,53].
The coefficient estimates of the GWR model highlighted spatial variations in the relationships between diphtheria incidence and the three independent variables. This suggests that the level of influence of each independent variable on diphtheria incidence might have varied from one commune to another. Gaining these local-level insights would simply not have been possible using global OLS techniques. These findings should be complemented by qualitative studies to understand why and how the interrelationships between diphtheria incidence and the independent variables differ across Haiti. Such investigations might help to better explain the observed differences in diphtheria incidence.
Of note in our results is the lack of association between diphtheria incidence and risk factors related to vaccination, especially given that just 15% of the confirmed cases in this study were reported to be vaccinated against diphtheria. Past research has highlighted several issues related to vaccination coverage measurements, including coverage estimates sometimes exceeding 100%, improbable year-to-year variations, and epidemics in areas reporting high coverage [54]. These issues can be linked to weaknesses in immunization information systems (IIS) and inaccuracies in vaccination coverage denominators. Unfortunately, Haiti faces both problems. A multi-country evaluation from 2009 found major flaws in the national IIS [55]. It is probable that some of those inadequacies are still present today. Furthermore, Haiti's vaccination coverage estimates are unlikely to be accurate as they are based on population projections; the last official census dates back to 2003 [56]. It is, thus, plausible that inadequate vaccination contributes to the propagation of diphtheria in the country, though this cannot be demonstrated through this study.
A number of limitations may have affected our findings. Although diphtheria is a nationally notifiable disease in Haiti, some underreporting by physicians may still occur for a variety of reasons, including misdiagnosis. Additionally, asymptomatic cases and symptomatic individuals who did not seek medical care may have gone unreported. Consequently, notified cases may not necessarily reflect the actual incidence of diphtheria. Moreover, data for the examined variables were from different time periods, which reduces the reliability of the regression estimates. Furthermore, data on certain risk factors known to correlate with diphtheria were unavailable (e.g., level of wealth, knowledge of diphtheria) [20], impeding further analysis. Additionally, as our models were based on aggregated data, there is a risk of ecological fallacy, which consists in assuming that associations observed at the commune level will necessarily hold at the individual level [57]. Finally, like other analytic methods, GWR has some drawbacks: its spatial weighting function accounts for geographical distance but ignores the attributes of the observations [58]; local multicollinearity may be present in a GWR model, even if the independent variables are not collinear at the global level [59]. Given these limitations, alternative approaches have been proposed, including conditional autoregressive (CAR) models, simultaneous autoregressive (SAR) models, and Bayesian hierarchical models [59,60].
To our knowledge, this is the first study that describes the epidemiology of diphtheria in Haiti using GIS and spatial analysis. The study has shown that GWR is a useful technique for exploratory and descriptive data analysis, which not only improves on the OLS performance but enables the discovery of hidden spatial relationships between variables. This investigation has also demonstrated that between 2014 and 2021 diphtheria exhibited spatial variability in Haiti, with the clustering of high and low incidence areas. The hotspots detected in this analysis could serve as a basis for prioritizing and targeting response activities. The baseline estimates of diphtheria incidence presented in this paper could guide surveillance activities and help track progress in the control of the disease. Further research and continued monitoring of the factors found to be associated with diphtheria incidence could help us better understand the spread of the disease.